{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 使用Yi的API构建RAG(LangChain)\n",
    "\n",
    "在本教程中，我们将学习如何使用LangChain和Yi的API构建一个RAG（Retrieval-Augmented Generation）系统。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 安装必要的库\n",
    "\n",
    "首先，我们需要安装LangChain库。"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "!pip install langchain"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 配置环境变量\n",
    "\n",
    "接下来，我们需要配置LangSmith和Yi的API密钥。请确保您已经在[LangSmith](https://smith.langchain.com/)和零一万物[开放平台](https://platform.lingyiwanwu.com/apikeys)注册并获取了API密钥。"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import getpass\n",
    "import os\n",
    "\n",
    "os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
    "os.environ[\"LANGCHAIN_API_KEY\"] = \"your_langsmith_api_key\"\n",
    "os.environ[\"YI_API_KEY\"] = \"your_yi_api_key\""
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 安装LangChain-OpenAI库\n",
    "\n",
    "我们还需要安装`langchain-openai`库。"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "!pip install -qU langchain-openai"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 配置LLM\n",
    "\n",
    "现在我们来配置LLM，使用Yi的大型语言模型。"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "llm = ChatOpenAI(\n",
    "    base_url=\"https://api.lingyiwanwu.com/v1\",\n",
    "    api_key=os.environ[\"YI_API_KEY\"],\n",
    "    model=\"yi-large\",\n",
    ")"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 加载数据\n",
    "\n",
    "接下来我们将加载一些示例数据。这里我们使用了LangChain的WebBaseLoader来加载网页数据。"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import bs4\n",
    "from langchain_community.document_loaders import WebBaseLoader\n",
    "\n",
    "loader = WebBaseLoader(\n",
    "    web_paths=(\"https://lilianweng.github.io/posts/2023-06-23-agent/\",),\n",
    "    bs_kwargs=dict(\n",
    "        parse_only=bs4.SoupStrainer(\n",
    "            class_=(\"post-content\", \"post-title\", \"post-header\")\n",
    "        )\n",
    "    ),\n",
    ")\n",
    "docs = loader.load()"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 构建向量数据库\n",
    "\n",
    "我们将使用HuggingFace的嵌入模型来构建向量数据库，并使用Chroma来存储向量。"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from langchain.embeddings import HuggingFaceEmbeddings\n",
    "from langchain_chroma import Chroma\n",
    "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
    "\n",
    "# 加载向量模型\n",
    "embedding = HuggingFaceEmbeddings(model_name=\"BAAI/bge-base-en-v1.5\")\n",
    "\n",
    "# 切分文档\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n",
    "splits = text_splitter.split_documents(docs)\n",
    "\n",
    "# 构建向量数据库\n",
    "vectorstore = Chroma.from_documents(documents=splits, embedding=embedding)\n",
    "retriever = vectorstore.as_retriever()"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 构建RAG链\n",
    "\n",
    "最后，我们将构建RAG链，并使用LangChain的hub来获取提示。"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from langchain import hub\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.runnables import RunnablePassthrough\n",
    "\n",
    "prompt = hub.pull(\"rlm/rag-prompt\")\n",
    "\n",
    "def format_docs(docs):\n",
    "    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
    "\n",
    "rag_chain = (\n",
    "    {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
    "    | prompt\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")\n",
    "\n",
    "response = rag_chain.invoke(\"What is Task Decomposition?\")\n",
    "print(response)"
   ],
   "execution_count": null,
   "outputs": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
